Government and Public Decisions
Professor: Dr. José Manuel Magallanes, PhD
A look at the Mininter data
Data sourced from the observatory

What do we have?
In [1]:
import geopandas as gpd
PuntosMininter=gpd.read_file("https://github.com/SocialAnalytics-StrategicIntelligence/pnpData/raw/main/features10000.geojson")
PuntosMininter.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 24 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   direccion     10000 non-null  object
 1   id_dpto       10000 non-null  object
 2   departamento  10000 non-null  object
 3   id_prov       10000 non-null  object
 4   provincia     10000 non-null  object
 5   id_dist       10000 non-null  object
 6   distrito      10000 non-null  object
 7   cod_cpnp      9999 non-null   object
 8   comisaria     10000 non-null  object
 9   id_materia    10000 non-null  int64
 10  materia       10000 non-null  object
 11  id_tipo       10000 non-null  int64
 12  tipo          10000 non-null  object
 13  id_subtipo    10000 non-null  int64
 14  subtipo       10000 non-null  object
 15  id_modalidad  10000 non-null  int64
 16  modalidad     10000 non-null  object
 17  anio_hecho    10000 non-null  int64
 18  mes_hecho     10000 non-null  int64
 19  dia_hecho     10000 non-null  int64
 20  turno_hecho   10000 non-null  object
 21  time          10000 non-null  object
 22  estado        10000 non-null  int64
 23  geometry      10000 non-null  geometry
dtypes: geometry(1), int64(8), object(15)
memory usage: 1.8+ MB
Which groups do we want to analyze?
- Period?
- Place?
In [2]:
# period
PuntosMininter.anio_hecho.value_counts()
Out[2]:
anio_hecho
2018    9998
2020       1
2019       1
Name: count, dtype: int64
In [3]:
# provinces
PuntosMininter.provincia.value_counts()
Out[3]:
provincia
LIMA 6190
CALLAO 753
AREQUIPA 611
PIURA 430
TRUJILLO 419
...
PISCO 1
YAULI 1
JAEN 1
AYABACA 1
LORETO 1
Name: count, Length: 70, dtype: int64
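Lima in 2018 holds most of the records (6,190 of the 10,000 come from Lima, and 9,998 are from 2018), so the analysis focuses on that group: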
In [4]:
delitosLima2018=PuntosMininter[(PuntosMininter.provincia=='LIMA') & (PuntosMininter.anio_hecho==2018)]
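The choice of Lima 2018 follows from the two separate tallies above. A minimal sketch, on a toy table whose rows are invented for illustration (they are not taken from the Mininter file), of counting both variables jointly and then applying the same kind of boolean filter:

```python
import pandas as pd

# Toy records (hypothetical values, for illustration only)
toy = pd.DataFrame({
    "provincia": ["LIMA", "LIMA", "LIMA", "CALLAO", "AREQUIPA", "LIMA"],
    "anio_hecho": [2018, 2018, 2018, 2018, 2018, 2019],
})

# Count every province/year pair and pick the best-populated one
conteo = toy.groupby(["provincia", "anio_hecho"]).size()
provincia_top, anio_top = conteo.idxmax()

# Same style of boolean filter as in the notebook, driven by that result
seleccion = toy[(toy.provincia == provincia_top) & (toy.anio_hecho == anio_top)]
print(provincia_top, anio_top, len(seleccion))
```

Counting the pairs jointly guards against the case where the largest province and the largest year do not actually coincide in the same records.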
Exploring Lima 2018
Reports per police station:
In [5]:
import pandas as pd
pd.DataFrame(delitosLima2018.comisaria.value_counts()).reset_index(drop=False)
Out[5]:
| | comisaria | count |
|---|---|---|
| 0 | CPNP CANTO REY | 363 |
| 1 | CPNP VILLA EL SALVADOR | 247 |
| 2 | CPNP SOL DE ORO | 244 |
| 3 | CPNP LAURA CALLER IBERICO | 218 |
| 4 | CPNP VILLA MARIA DEL TRIUNFO | 209 |
| ... | ... | ... |
| 105 | CPNP LA UNIFICADA | 5 |
| 106 | CPNP LAS PRADERAS | 4 |
| 107 | CPNP SAN BARTOLO | 4 |
| 108 | CPNP JICAMARCA | 3 |
| 109 | CPNP MARANGA | 1 |
110 rows × 2 columns
Where to focus?
In [6]:
delitosLima2018.comisaria.value_counts().describe()
Out[6]:
count    110.000000
mean      56.254545
std       63.067524
min        1.000000
25%       17.000000
50%       35.000000
75%       66.000000
max      363.000000
Name: count, dtype: float64
In [7]:
# where should the cutoff be?
import matplotlib.pyplot as plt
ax=delitosLima2018.comisaria.value_counts().plot(kind='barh')
# shrink the station labels so they remain legible
ax.tick_params(axis='y', labelsize=5)
Cutoff point: the median (35 reports per station)
In [8]:
conteoComisarias=delitosLima2018.comisaria.value_counts()
# keep the stations at or above the median of 35 reports
seleccionComisarias=conteoComisarias[conteoComisarias>=35].index
seleccionComisarias
Out[8]:
Index(['CPNP CANTO REY', 'CPNP VILLA EL SALVADOR', 'CPNP SOL DE ORO',
'CPNP LAURA CALLER IBERICO', 'CPNP VILLA MARIA DEL TRIUNFO',
'CPNP SANTA ANITA', 'CPNP LA PASCANA', 'CPNP ALFONSO UGARTE',
'CPNP SANTA CLARA', 'CPNP APOLO', 'CPNP MATEO PUMACAHUA', 'CPNP ZARATE',
'CPNP SURCO', 'CPNP SAN MARTIN DE PORRES', 'CPNP SANTOYO',
'CPNP PAMPLONA I', 'CPNP INDEPENDENCIA', 'CPNP JESUS MARIA',
'CPNP SAN BORJA', 'CPNP PUENTE PIEDRA', 'CPNP SAGITARIO',
'CPNP LA HUAYRONA', 'CPNP SURQUILLO', 'CPNP EL PROGRESO',
'CPNP HUACHIPA', 'CPNP BREÑA', 'CPNP EL MANZANO', 'CPNP MIRAFLORES',
'CPNP BARBONCITOS', 'CPNP SANTA ELIZABETH', 'CPNP MAGDALENA',
'CPNP VILLA', 'CPNP PUEBLO LIBRE', 'CPNP SAN LUIS', 'CPNP MONSERRATE',
'CPNP LA VICTORIA', 'CPNP BARRANCO', 'CPNP SAN ISIDRO',
'CPNP MONTERRICO', 'CPNP CHACRA COLORADA', 'CPNP PRO',
'CPNP URBANIZACION PACHACAMAC', 'CPNP CONDEVILLA', 'CPNP PACHACAMAC',
'CPNP HUAYCAN', 'CPNP SAN JUAN DE MIRAFLORES', 'CPNP VILLA ALEJANDRO',
'CPNP PALOMINO', 'CPNP TUPAC AMARU', 'CPNP UNIDAD VECINAL MIRONES ALTO',
'CPNP LA MOLINA', 'CPNP SAN ANDRES', 'CPNP LURIN', 'CPNP CAJA DE AGUA',
'CPNP LINCE', 'CPNP TAHUANTINSUYO',
'CPNP SAN FRANCISCO TABLADA DE LURIN'],
dtype='object', name='comisaria')
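The cutoff above is typed in as 35, the median reported by `describe()`. A minimal sketch, using made-up station labels and counts (assumptions for illustration only), of deriving the cutoff from the data so that it stays correct if the input changes:

```python
import pandas as pd

# Hypothetical reports, one row per report, labelled by station
reportes = pd.Series(
    ["A"] * 6 + ["B"] * 4 + ["C"] * 3 + ["D"] * 2 + ["E"] * 1,
    name="comisaria",
)

conteo = reportes.value_counts()

# Cutoff computed from the counts instead of being hard-coded
corte = conteo.median()

# Stations at or above the median number of reports
seleccion = conteo[conteo >= corte].index.tolist()
print(corte, seleccion)
```

With `>=`, stations sitting exactly on the median are kept, which matches the comparison used in the notebook.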
We will focus on the reports from these stations:
In [9]:
DelitosLima2018_peores=delitosLima2018[delitosLima2018.comisaria.isin(seleccionComisarias)].copy().reset_index(drop=True)
DelitosLima2018_peores
Out[9]:
| | direccion | id_dpto | departamento | id_prov | provincia | id_dist | distrito | cod_cpnp | comisaria | id_materia | ... | subtipo | id_modalidad | modalidad | anio_hecho | mes_hecho | dia_hecho | turno_hecho | time | estado | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JR. PROGRESO MZ- 15 LT-03 ?PACHACAMAC | 15 | LIMA | 1501 | LIMA | 150123 | PACHACAMAC | 2302 | CPNP PACHACAMAC | 1 | ... | DAÑOS | 1051001 | DAÑO SIMPLE | 2018 | 1 | 1 | tarde | 1/1/2018, 12:35 PM | 1 | POINT (-76.85853 -12.23112) |
| 1 | AV. REVOLUCION Y LA AV. 03 DE OCTUBRE | 15 | LIMA | 1501 | LIMA | 150142 | VILLA EL SALVADOR | 2270 | CPNP VILLA EL SALVADOR | 1 | ... | PELIGRO COMUN | 1120102 | CONDUCCION EN ESTADO DE EBRIEDAD O DROGADICCION | 2018 | 1 | 1 | mañana | 1/1/2018, 6:40 AM | 1 | POINT (-76.93689 -12.21293) |
| 2 | AV. LOS ALAMOS CON AV. EL SOL SAN JUAN DE LURI... | 15 | LIMA | 1501 | LIMA | 150132 | SAN JUAN DE LURIGANCHO | 2090 | CPNP CANTO REY | 1 | ... | PELIGRO COMUN | 1120101 | PELIGRO POR MEDIO DE INCENDIO O EXPLOSION | 2018 | 1 | 1 | madrugada | 1/1/2018, 5:10 AM | 1 | POINT (-76.99931 -11.98653) |
| 3 | av. dante cuadra 04 | 15 | LIMA | 1501 | LIMA | 150141 | SURQUILLO | 2190 | CPNP SURQUILLO | 1 | ... | LESIONES | 1010311 | LESIONES LEVES | 2018 | 1 | 1 | madrugada | 1/1/2018, 5:20 AM | 1 | POINT (-77.01211 -12.11003) |
| 4 | JIRON ARGENTINA Y JIRON COLOMBIA CHOSICA | 15 | LIMA | 1501 | LIMA | 150118 | LURIGANCHO | 2170 | CPNP HUACHIPA | 1 | ... | PELIGRO COMUN | 1120102 | CONDUCCION EN ESTADO DE EBRIEDAD O DROGADICCION | 2018 | 1 | 1 | mañana | 1/1/2018, 8:00 AM | 1 | POINT (-76.91800 -12.00326) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5315 | OVALO MIRAFLORES | 15 | LIMA | 1501 | LIMA | 150122 | MIRAFLORES | 2215 | CPNP MIRAFLORES | 1 | ... | HURTO | 1050101 | HURTO | 2018 | 1 | 21 | noche | 1/21/2018, 7:30 PM | 1 | POINT (-77.02908 -12.11921) |
| 5316 | LAS GAVIOTAS MZ J2 LOTE 04 HUACHIPA | 15 | LIMA | 1501 | LIMA | 150118 | LURIGANCHO | 2170 | CPNP HUACHIPA | 1 | ... | ROBO | 1050215 | ROBO AGRAVADO | 2018 | 1 | 22 | mañana | 1/22/2018, 11:30 AM | 1 | POINT (-76.95009 -12.01783) |
| 5317 | AV. HIPÓLITO UNANUE Y PROLONGACIÓN CANGALLO | 15 | LIMA | 1501 | LIMA | 150115 | LA VICTORIA | 2030 | CPNP LA VICTORIA | 1 | ... | VIOLACION DE LA LIBERTAD PERSONAL | 1040101 | COACCION | 2018 | 1 | 20 | mañana | 1/20/2018, 10:00 AM | 1 | POINT (-77.02070 -12.06614) |
| 5318 | Calle José Galvez 550, Miraflores 15074, Perú | 15 | LIMA | 1501 | LIMA | 150122 | MIRAFLORES | 2215 | CPNP MIRAFLORES | 1 | ... | HURTO | 1050107 | HURTO AGRAVADO | 2018 | 1 | 18 | tarde | 1/18/2018, 4:00 PM | 1 | POINT (-77.03506 -12.12107) |
| 5319 | Calle 4 MZ i LT 10, Santa Anita 15011, Perú | 15 | LIMA | 1501 | LIMA | 150137 | SANTA ANITA | 2120 | CPNP SANTA ANITA | 1 | ... | HURTO | 1050119 | HURTO DE VEHICULO | 2018 | 1 | 22 | madrugada | 1/22/2018, 2:00 AM | 1 | POINT (-76.95238 -12.03550) |
5320 rows × 24 columns
Global spatial overview
In [10]:
#pip install mapclassify
In [11]:
DelitosLima2018_peores.explore()
If the district boundaries are needed:
In [2]:
limitesLima=gpd.read_file("https://github.com/SocialAnalytics-StrategicIntelligence/pnpData/raw/main/LimaDelitos.gpkg", layer='poligonos')
limitesLima.plot()
Out[2]:
<Axes: >
The boundaries as a base layer under the points:
In [13]:
import folium
base=limitesLima.explore(name="Limites",color='grey',style_kwds={'fill':False})
m = DelitosLima2018_peores.explore(m=base, name="Points")
# this is completely optional
folium.LayerControl().add_to(m)
m
Detailed exploration: time of the crime
In [14]:
DelitosLima2018_peores.turno_hecho.value_counts()
Out[14]:
turno_hecho
noche        1834
tarde        1332
mañana       1199
madrugada     955
Name: count, dtype: int64
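The shift labels are coarse. If a finer view is wanted, the `time` column could be parsed into timestamps. A minimal sketch, assuming every value follows the month/day/year, 12-hour pattern seen in the sample rows shown earlier (the three strings below are copied from those rows):

```python
import pandas as pd

# Timestamps copied from the sample rows displayed earlier
tiempos = pd.Series(["1/1/2018, 12:35 PM", "1/1/2018, 6:40 AM", "1/21/2018, 7:30 PM"])

# Explicit format: month/day/year, then 12-hour clock with AM/PM
marca = pd.to_datetime(tiempos, format="%m/%d/%Y, %I:%M %p")

# Hour of day on a 0-23 scale
horas = marca.dt.hour.tolist()
print(horas)
```

Whether the whole file respects this pattern is not confirmed here, so an explicit `format` is useful: rows that deviate raise an error rather than being silently misread.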
In [15]:
DelitosLima2018_peores.turno_hecho.value_counts().plot(kind='bar')
Out[15]:
<Axes: xlabel='turno_hecho'>
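`value_counts()` sorts the bars by frequency, not by time of day. A minimal sketch, reusing the four counts printed above and assuming the chronological order madrugada, mañana, tarde, noche, of reordering the shifts and expressing them as shares:

```python
import pandas as pd

# Counts copied from the value_counts() output above
conteo = pd.Series(
    {"noche": 1834, "tarde": 1332, "mañana": 1199, "madrugada": 955},
    name="count",
)

# Assumed chronological order of the shifts
orden = ["madrugada", "mañana", "tarde", "noche"]
cronologico = conteo.reindex(orden)

# Share of each shift in the total, in percent
porcentaje = (cronologico / cronologico.sum() * 100).round(1)
print(porcentaje)
```

Calling `cronologico.plot(kind='bar')` would then draw the bars in day order, which makes a rise or fall across the day easier to read.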
When the values are similar, patterns are usually hard to see on the map:
In [16]:
import folium
base=limitesLima.explore(name="Limites",color='grey',style_kwds={'fill':False},tiles='CartoDB positron')
m = DelitosLima2018_peores.explore(m=base, name="Points",
column="turno_hecho",
categorical=True,
cmap=['blue','green','orange','black'])
folium.LayerControl().add_to(m)
m
Let's look at a two-way table:
In [17]:
# each column sums to 100%
pd.crosstab(DelitosLima2018_peores['distrito'],DelitosLima2018_peores['turno_hecho'],normalize='columns',margins=True)*100
Out[17]:
| distrito | madrugada | mañana | noche | tarde | All |
|---|---|---|---|---|---|
| ATE | 4.816754 | 3.586322 | 3.598691 | 4.354354 | 4.003759 |
| BARRANCO | 1.675393 | 0.750626 | 0.654308 | 1.051051 | 0.958647 |
| BREÑA | 1.256545 | 1.834862 | 2.290076 | 3.003003 | 2.180451 |
| CARABAYLLO | 2.094241 | 1.584654 | 1.308615 | 1.951952 | 1.672932 |
| CHORRILLOS | 3.036649 | 3.669725 | 3.162486 | 4.654655 | 3.627820 |
| COMAS | 5.549738 | 4.587156 | 4.362050 | 3.078078 | 4.304511 |
| EL AGUSTINO | 2.617801 | 2.085071 | 2.181025 | 1.951952 | 2.180451 |
| INDEPENDENCIA | 3.036649 | 2.251877 | 3.707743 | 1.576577 | 2.725564 |
| JESUS MARIA | 0.837696 | 1.751460 | 2.290076 | 2.552553 | 1.973684 |
| LA MOLINA | 0.732984 | 0.667223 | 0.654308 | 0.675676 | 0.676692 |
| LA VICTORIA | 2.198953 | 4.170142 | 3.925845 | 5.180180 | 3.984962 |
| LIMA | 4.712042 | 7.422852 | 6.324973 | 7.582583 | 6.597744 |
| LINCE | 1.047120 | 0.583820 | 0.490731 | 0.750751 | 0.676692 |
| LOS OLIVOS | 10.157068 | 5.504587 | 7.251908 | 5.105105 | 6.842105 |
| LURIGANCHO | 1.989529 | 1.251043 | 1.254089 | 1.801802 | 1.522556 |
| LURIN | 1.256545 | 1.000834 | 1.254089 | 1.801802 | 1.334586 |
| MAGDALENA DEL MAR | 0.837696 | 1.084237 | 1.417666 | 0.750751 | 1.071429 |
| MIRAFLORES | 0.628272 | 1.501251 | 0.981461 | 1.801802 | 1.240602 |
| PACHACAMAC | 0.732984 | 0.417014 | 1.254089 | 0.675676 | 0.827068 |
| PUEBLO LIBRE | 1.256545 | 1.000834 | 0.981461 | 0.975976 | 1.033835 |
| PUENTE PIEDRA | 1.989529 | 1.751460 | 1.962923 | 1.876877 | 1.898496 |
| RIMAC | 1.361257 | 1.751460 | 1.035987 | 1.126126 | 1.278195 |
| SAN BORJA | 1.047120 | 2.168474 | 1.908397 | 2.402402 | 1.936090 |
| SAN ISIDRO | 0.209424 | 1.251043 | 0.981461 | 1.126126 | 0.939850 |
| SAN JUAN DE LURIGANCHO | 13.193717 | 14.762302 | 11.832061 | 12.387387 | 12.875940 |
| SAN JUAN DE MIRAFLORES | 3.141361 | 3.753128 | 2.181025 | 3.003003 | 2.913534 |
| SAN LUIS | 1.465969 | 1.000834 | 0.708833 | 1.051051 | 0.996241 |
| SAN MARTIN DE PORRES | 6.701571 | 7.005838 | 7.306434 | 6.531532 | 6.936090 |
| SANTA ANITA | 3.769634 | 4.086739 | 4.089422 | 3.378378 | 3.853383 |
| SANTIAGO DE SURCO | 3.979058 | 4.587156 | 5.725191 | 4.879880 | 4.943609 |
| SURQUILLO | 1.884817 | 1.334445 | 1.908397 | 2.027027 | 1.804511 |
| VILLA EL SALVADOR | 5.130890 | 5.504587 | 6.324973 | 4.954955 | 5.582707 |
| VILLA MARIA DEL TRIUNFO | 5.654450 | 4.336947 | 4.689204 | 3.978979 | 4.605263 |
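The `normalize` argument decides which total each cell is divided by. A minimal sketch on a toy table (the eight rows are invented for illustration) comparing the three options that appear in this notebook:

```python
import pandas as pd

# Toy reports (hypothetical): two districts, two shifts
toy = pd.DataFrame({
    "distrito": ["X", "X", "X", "Y", "Y", "Y", "Y", "Y"],
    "turno_hecho": ["noche", "noche", "tarde", "noche", "tarde", "tarde", "tarde", "tarde"],
})

# Each column sums to 100: how a shift is distributed across districts
por_columna = pd.crosstab(toy["distrito"], toy["turno_hecho"], normalize="columns") * 100

# Each row sums to 100: how a district is distributed across shifts
por_fila = pd.crosstab(toy["distrito"], toy["turno_hecho"], normalize="index") * 100

# All cells together sum to 100: share of the grand total
sobre_total = pd.crosstab(toy["distrito"], toy["turno_hecho"], normalize="all") * 100

print(por_columna, por_fila, sobre_total, sep="\n\n")
```

Column percentages favor large districts, since they contribute more reports to every shift, while row percentages describe each district's daily profile regardless of its size.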
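There is an outlier: San Juan de Lurigancho accounts for about 12.9% of all reports, nearly twice the share of the next district: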
In [18]:
# normalize over the grand total: all cells together sum to 100%
import seaborn as sns
delitos_margins=pd.crosstab(DelitosLima2018_peores['distrito'],DelitosLima2018_peores['turno_hecho'],normalize='all',margins=False)*100
ax=sns.clustermap(delitos_margins,method='ward', cmap='coolwarm', yticklabels=True,annot=True)
ax.tick_params(axis='y', labelsize=6)
Out[18]:
<seaborn.matrix.ClusterGrid at 0x169018810>
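The outlier is identified visually in the notebook. One way to cross-check that impression is a rule of thumb such as the 1.5 × IQR fence. This is a sketch under that assumption, not the notebook's own method; the shares are copied from the `All` column of the two-way table above.

```python
import pandas as pd

# Overall district shares (%), copied from the `All` column of the table above
participacion = pd.Series({
    "ATE": 4.003759, "BARRANCO": 0.958647, "BREÑA": 2.180451,
    "CARABAYLLO": 1.672932, "CHORRILLOS": 3.627820, "COMAS": 4.304511,
    "EL AGUSTINO": 2.180451, "INDEPENDENCIA": 2.725564, "JESUS MARIA": 1.973684,
    "LA MOLINA": 0.676692, "LA VICTORIA": 3.984962, "LIMA": 6.597744,
    "LINCE": 0.676692, "LOS OLIVOS": 6.842105, "LURIGANCHO": 1.522556,
    "LURIN": 1.334586, "MAGDALENA DEL MAR": 1.071429, "MIRAFLORES": 1.240602,
    "PACHACAMAC": 0.827068, "PUEBLO LIBRE": 1.033835, "PUENTE PIEDRA": 1.898496,
    "RIMAC": 1.278195, "SAN BORJA": 1.936090, "SAN ISIDRO": 0.939850,
    "SAN JUAN DE LURIGANCHO": 12.875940, "SAN JUAN DE MIRAFLORES": 2.913534,
    "SAN LUIS": 0.996241, "SAN MARTIN DE PORRES": 6.936090,
    "SANTA ANITA": 3.853383, "SANTIAGO DE SURCO": 4.943609,
    "SURQUILLO": 1.804511, "VILLA EL SALVADOR": 5.582707,
    "VILLA MARIA DEL TRIUNFO": 4.605263,
})

# Upper fence: third quartile plus 1.5 times the interquartile range
q1, q3 = participacion.quantile([0.25, 0.75])
limite_superior = q3 + 1.5 * (q3 - q1)

# Districts whose share exceeds the fence
atipicos = participacion[participacion > limite_superior]
print(atipicos)
```

Under this rule only one district lies above the fence, which agrees with the choice made in the next step.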
Let's remove the outlier.
What do we want? To group districts.
How does crime in a district vary across the day?
a. Relative to the district's own counts (each row sums to 100%)
In [19]:
DelitosLima2018_peores_sinSJL=DelitosLima2018_peores[DelitosLima2018_peores['distrito']!='SAN JUAN DE LURIGANCHO']
delitos_margins=pd.crosstab(DelitosLima2018_peores_sinSJL['distrito'],DelitosLima2018_peores_sinSJL['turno_hecho'],normalize='index')*100
ax=sns.clustermap(delitos_margins, method='ward',cmap='coolwarm', yticklabels=True,annot=True)
ax.tick_params(axis='y', labelsize=6)
Out[19]:
<seaborn.matrix.ClusterGrid at 0x16aabf890>
b. Relative to the counts in each shift of the day (each column sums to 100%)
In [20]:
delitos_margins=pd.crosstab(DelitosLima2018_peores_sinSJL['distrito'],DelitosLima2018_peores_sinSJL['turno_hecho'],normalize='columns')*100
ax=sns.clustermap(delitos_margins, method='ward', cmap='coolwarm', yticklabels=True,annot=True)
ax.tick_params(axis='y', labelsize=6)
Out[20]:
<seaborn.matrix.ClusterGrid at 0x16ab41f10>
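`clustermap` draws the dendrograms but does not hand back group labels directly. A minimal sketch of obtaining explicit groups with SciPy's hierarchical clustering and the same `ward` method; the four district profiles and the choice of two groups are assumptions made for illustration:

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical shift profiles (each row sums to 100), not real districts
perfiles = pd.DataFrame(
    {
        "madrugada": [40, 38, 10, 12],
        "mañana": [10, 12, 30, 28],
        "noche": [40, 40, 20, 22],
        "tarde": [10, 10, 40, 38],
    },
    index=["D1", "D2", "D3", "D4"],
)

# Ward linkage on the rows, as requested with method='ward' in clustermap
enlaces = linkage(perfiles.values.astype(float), method="ward")

# Cut the tree into two groups (the number of groups is an assumption)
grupos = pd.Series(fcluster(enlaces, t=2, criterion="maxclust"), index=perfiles.index)
print(grupos)
```

The resulting labels could be joined back to the district polygons to map the groups, provided the district names match in both tables.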
Detailed exploration: type of crime
Does the outlier change the ranking?
In [21]:
DelitosLima2018_peores.subtipo.value_counts().head(10)
Out[21]:
subtipo
HURTO                                2357
ROBO                                 1793
LESIONES                              355
PELIGRO COMUN                         290
VIOLACION DE LA LIBERTAD SEXUAL       153
ESTAFA Y OTRAS DEFRAUDACIONES          89
SALUD PUBLICA                          38
COMETIDOS POR PARTICULARES             36
USURPACION                             32
VIOLACION DE LA LIBERTAD PERSONAL      23
Name: count, dtype: int64
In [22]:
DelitosLima2018_peores_sinSJL.subtipo.value_counts().head(10)
Out[22]:
subtipo
HURTO                              2084
ROBO                               1518
LESIONES                            319
PELIGRO COMUN                       253
VIOLACION DE LA LIBERTAD SEXUAL     139
ESTAFA Y OTRAS DEFRAUDACIONES        80
SALUD PUBLICA                        34
COMETIDOS POR PARTICULARES           29
USURPACION                           24
DAÑOS                                19
Name: count, dtype: int64
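The two rankings can be compared directly instead of by eye. A minimal sketch that reuses the top five counts printed in the two outputs above (copied by hand, so the figures are only as reliable as that transcription):

```python
import pandas as pd

tipos = ["HURTO", "ROBO", "LESIONES", "PELIGRO COMUN", "VIOLACION DE LA LIBERTAD SEXUAL"]

# Top five counts with and without San Juan de Lurigancho, from the outputs above
con_sjl = pd.Series([2357, 1793, 355, 290, 153], index=tipos)
sin_sjl = pd.Series([2084, 1518, 319, 253, 139], index=tipos)

# Does the ranking of the top five survive the removal?
mismo_orden = (
    con_sjl.sort_values(ascending=False).index.tolist()
    == sin_sjl.sort_values(ascending=False).index.tolist()
)

# Reports contributed by the removed district, and its share of each type (%)
aporte = con_sjl - sin_sjl
aporte_pct = (aporte / con_sjl * 100).round(1)
print(mismo_orden)
print(aporte_pct)
```

The order of the five most frequent types is the same in both outputs, so dropping the outlier should not change which types are selected in the next step.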
In [23]:
tiposParaAnalisis=DelitosLima2018_peores_sinSJL.subtipo.value_counts().head(5).index
tiposParaAnalisis
Out[23]:
Index(['HURTO', 'ROBO', 'LESIONES', 'PELIGRO COMUN',
'VIOLACION DE LA LIBERTAD SEXUAL'],
dtype='object', name='subtipo')
In [24]:
DelitosLima2018_peoresTipos=DelitosLima2018_peores_sinSJL[DelitosLima2018_peores_sinSJL.subtipo.isin(tiposParaAnalisis)]
delitosTipos_margins=pd.crosstab(DelitosLima2018_peoresTipos['distrito'],DelitosLima2018_peoresTipos['subtipo'],normalize='columns')*100
ax=sns.clustermap(delitosTipos_margins, method='ward',cmap='coolwarm', yticklabels=True,annot=True)
ax.tick_params(axis='y', labelsize=6)
Out[24]:
<seaborn.matrix.ClusterGrid at 0x15e142890>
In [25]:
centroidLima=DelitosLima2018_peoresTipos.dissolve().centroid
/var/folders/2n/bkfhfqq16r78g3hf7pdj56y40000gn/T/ipykernel_19423/53041390.py:1: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
  centroidLima=DelitosLima2018_peoresTipos.dissolve().centroid
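The warning asks for a projected CRS before taking the centroid. A minimal sketch of that workflow on three made-up points near Lima, assuming the data are in EPSG:4326 and that UTM zone 18S (EPSG:32718) is an acceptable projected CRS for the area; both are assumptions, not facts stated in the notebook:

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical points near Lima, given as longitude/latitude (EPSG:4326 assumed)
puntos = gpd.GeoDataFrame(
    geometry=[Point(-77.03, -12.05), Point(-76.95, -12.20), Point(-77.00, -11.98)],
    crs="EPSG:4326",
)

# Project to metres, take the centroid there, then return to longitude/latitude
centro = puntos.to_crs(epsg=32718).dissolve().centroid.to_crs(epsg=4326)

# folium expects the map center as [latitude, longitude]
y, x = centro.y.iloc[0], centro.x.iloc[0]
print(y, x)
```

For a map center at city scale the difference from the unprojected centroid is small, so the original cell remains usable; the reprojection mainly matters when the centroid feeds distance or area calculations.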
In [26]:
import folium
from folium import plugins
# the centroid coordinates give the map center
y=centroidLima.y.iloc[0]
x=centroidLima.x.iloc[0]
mapaCalor = folium.Map(location=[y, x], tiles="CartoDB positron", zoom_start=10)
# which crimes?
Cuales=DelitosLima2018_peoresTipos[DelitosLima2018_peoresTipos.subtipo.isin(['HURTO','ROBO'])]
# HeatMap expects [latitude, longitude] pairs
heat_data = [[point.y, point.x] for point in Cuales.geometry]
plugins.HeatMap(heat_data).add_to(mapaCalor)
mapaCalor
In [27]:
mapaCalor = folium.Map(location=[y, x], tiles="CartoDB positron", zoom_start=10)
# which crime?
Cuales=DelitosLima2018_peoresTipos[DelitosLima2018_peoresTipos.subtipo.isin(['VIOLACION DE LA LIBERTAD SEXUAL'])]
# HeatMap expects [latitude, longitude] pairs
heat_data = [[point.y, point.x] for point in Cuales.geometry]
plugins.HeatMap(heat_data).add_to(mapaCalor)
mapaCalor
In [28]:
# heat_data still holds the points of the last selection; show them as clustered markers
m = folium.Map(location=[y, x], tiles="CartoDB positron", zoom_start=10)
plugins.MarkerCluster(heat_data).add_to(m)
m
In [29]:
# limitesLima.to_file("LimaDelitos.gpkg", layer='poligonos', driver="GPKG")
# delitosLima2018.to_file("LimaDelitos.gpkg", layer='puntos2018', driver="GPKG")
# import fiona
# gpkg = 'LimaDelitos.gpkg'
# layers = fiona.listlayers('LimaDelitos.gpkg')
# layers